Efficient Learning of Relational Models for Sequential Decision Making

Authors

  • Michael L. Littman
  • Thomas J. Walsh
Abstract

Dissertation by Thomas J. Walsh. Dissertation Director: Michael L. Littman.

The exploration-exploitation tradeoff is crucial to reinforcement-learning (RL) agents, and a significant number of sample complexity results have been derived for agents in propositional domains. These results guarantee, with high probability, near-optimal behavior in all but a polynomial number of timesteps in the agent’s lifetime. In this work, we prove similar results for certain relational representations, primarily a class we call “relational action schemas”. These generalized models allow us to specify state transitions in a compact form, for instance describing the effect of picking up a generic block instead of picking up 10 different specific blocks. We present theoretical results on crucial subproblems in action-schema learning using the KWIK framework, which allows us to characterize the sample efficiency of an agent learning these models in a reinforcement-learning setting. These results are extended in an apprenticeship learning paradigm where an agent has access not only to its environment, but also to a teacher that can demonstrate traces of state/action/state sequences. We show that the class of action schemas that are efficiently learnable in this paradigm is strictly larger than the class learnable in the online setting. We link the class of efficiently learnable dynamics in the apprenticeship setting to a rich class of models derived from well-known learning frameworks. As an application, we present theoretical and empirical results on learning relational models of web-service descriptions using a dataflow model called a Task Graph to capture the important
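
The compactness that the abstract attributes to relational action schemas can be illustrated with a small sketch. The code below is not the dissertation's formalism: it assumes a deterministic, STRIPS-style schema with a single variable "X" and unary predicates, and the names `ActionSchema`, `PICKUP`, `on_table`, `clear`, `holding`, and `hand_empty` are hypothetical choices made only for this illustration. One schema covers picking up any of 10 blocks, rather than 10 separately specified actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSchema:
    """Hypothetical deterministic schema over one variable, written "X"."""
    name: str
    preconditions: frozenset   # literals as (predicate, argument) pairs
    add_effects: frozenset
    delete_effects: frozenset

    def ground(self, obj):
        """Substitute a concrete object for the variable "X"."""
        def bind(literals):
            return frozenset(
                (pred, obj if arg == "X" else arg) for pred, arg in literals
            )
        return (
            bind(self.preconditions),
            bind(self.add_effects),
            bind(self.delete_effects),
        )

    def apply(self, state, obj):
        """Return the successor state; unchanged if preconditions fail."""
        pre, add, delete = self.ground(obj)
        if not pre <= state:
            return state
        return (state - delete) | add


# A single generic "pickup" schema (assumed predicates, for illustration).
PICKUP = ActionSchema(
    name="pickup",
    preconditions=frozenset(
        {("on_table", "X"), ("clear", "X"), ("hand_empty", None)}
    ),
    add_effects=frozenset({("holding", "X")}),
    delete_effects=frozenset(
        {("on_table", "X"), ("clear", "X"), ("hand_empty", None)}
    ),
)

# Ten specific blocks, all handled by the one schema above.
blocks = [f"b{i}" for i in range(10)]
state = frozenset(
    {("hand_empty", None)}
    | {("on_table", b) for b in blocks}
    | {("clear", b) for b in blocks}
)

next_state = PICKUP.apply(state, "b3")
```

In this sketch a learner would need to identify one set of preconditions and effects for `pickup`, and that knowledge transfers to every block through grounding. Stochastic effects, which a reinforcement-learning setting would typically involve, are deliberately left out.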

Similar resources

Ranking Efficient Decision Making Units Using Cooperative Game Theory Based on SBM Input-Oriented Model and Nucleolus Value

When the efficiency of decision making units (DMUs) is evaluated with Data Envelopment Analysis (DEA) models, more than one DMU may have an efficiency score equal to one. Since ranking efficient DMUs is essential for decision makers, methods and models for this purpose have been proposed. One method for ranking efficient DMUs is cooperative game theory. In this study, Lee and Lozano mod...

Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering

The Basel II Accord pointed out the benefits of credit risk management through internal models that estimate the Probability of Default (PD). Banks use default predictions to estimate loan applicants’ PD. In practice, however, PD is not useful, and banks apply credit scorecards in their decision-making process. Competitive pressures in the lending industry have also forced banks to use profit scorecards...

Convergence in a sequential two stages decision making process

We analyze a sequential decision making process in which, at each step, the decision is made in two stages. In the first stage a partially optimal action is chosen, which allows the decision maker to learn how to improve it under the new environment. We show how inertia (cost of changing) may lead the process to converge to a routine where no further changes are made. We illustrate our scheme with some...

Efficient Approximate Policy Iteration Methods for Sequential Decision Making in Reinforcement Learning

(Computer Science—Machine Learning) EFFICIENT APPROXIMATE POLICY ITERATION METHODS FOR SEQUENTIAL DECISION MAKING IN REINFORCEMENT LEARNING

Ranking of Efficient and Non-Efficient Decision Making Units with Undesirable Data Based on Combined Models of DEA and TOPSIS

Data Envelopment Analysis (DEA) is a method for evaluating the performance of decision-making units (DMUs). Each decision-making unit uses multiple inputs to produce multiple outputs, which may be desirable or undesirable. Units whose performance score equals one are efficient. The concept of ranking decision makers because of the useful information they provide to decision m...

Publication date: 2010